Skip to content

osctrl-tls: hardening — constant-time secrets, per-IP enroll rate-limit, audit-log on failed enrolls (round 1b of 3)#816

Open
alvarofraguas wants to merge 2 commits into
jmpsec:mainfrom
alvarofraguas:pr/round-1b-osctrl-tls-hardening
Open

osctrl-tls: hardening — constant-time secrets, per-IP enroll rate-limit, audit-log on failed enrolls (round 1b of 3)#816
alvarofraguas wants to merge 2 commits into
jmpsec:mainfrom
alvarofraguas:pr/round-1b-osctrl-tls-hardening

Conversation

@alvarofraguas
Copy link
Copy Markdown
Contributor

Carves the osctrl-tls-facing changes out of #813 into their own PR, as requested in #813 (comment), so the TLS surface can be reviewed and tested in isolation.

Stack order: depends on #813. The shared infrastructure consumed here — pkg/auditlog.FailedEnroll, pkg/ratelimit, utils.SetTrustedProxies / updated GetIP — ships in #813. If you read the diff here against main it will include those #813 changes; the carve-out is the 5 files under cmd/tls/ + deploy/config/tls.yml. Once #813 merges this PR's diff drops to those 5 files only.

What this changes

5 files in cmd/tls/ + deploy/config/tls.yml. Three independent defenses on the only osctrl service that is internet-facing in a typical deployment:

1. Constant-time secret comparison

cmd/tls/handlers/utils.gocheckValidSecret and checkValidRemovePath use subtle.ConstantTimeCompare instead of the previous byte-by-byte short-circuiting compare. Removes a response-time timing oracle on the public enroll endpoint.

2. Per-IP rate-limit on /enroll

cmd/tls/main.go — 20 req/min per remote IP, idle eviction after 10 min, backed by pkg/ratelimit (from #813). Rejected requests:

  • return 429
  • record AuditLog.FailedEnroll(ip, env, "rate limit exceeded", 0) so SoC tooling can alert

3. Audit-log on failed enrollment

  • cmd/tls/handlers/handlers.goHandlersTLS gains an AuditLog field + WithAuditLog option; defensive default is a disabled manager so handler code can call h.AuditLog.FailedEnroll(...) without nil-checks.
  • cmd/tls/handlers/post.goEnrollHandler records FailedEnroll on the invalid-secret path.
  • cmd/tls/main.go — initializes auditlog.CreateAuditLogManager and threads it via WithAuditLog. Service name osctrl-tls flows through.

4. Trusted-proxies plumbing

cmd/tls/main.go — reads flagParams.Service.TrustedProxies (CIDR list) and calls utils.SetTrustedProxies before any request reaches GetIP. Empty default = GetIP ignores X-Forwarded-For / X-Real-IP, preventing internet-side header-spoofed bypass of the rate-limiter or audit-log poisoning.

Config

deploy/config/tls.yml:

  • auditLog: true (on by default; was false)
  • trustedProxies: "" (empty default; operators behind a trusted reverse proxy can list CIDRs)

Verified

  • go build ./... clean
  • go vet ./... clean
  • go test ./cmd/tls/... green
  • Live smoke on a dev stack: 4 osquery nodes enrolled, audit_logs records enroll_failure rows with correct IPs for forced bad-secret attempts

Test plan

  • CI: lint + build + tests pass once workflows are approved
  • Manual: with trustedProxies empty, confirm X-Forwarded-For: 1.2.3.4 doesn't override RemoteAddr in audit_logs
  • Manual: trigger >20 enroll requests from the same IP in <1 min, confirm 429 + audit-log entry
  • Manual: bad enroll-secret produces an enroll_failure audit-log row with the source IP

…, shared rate-limit + audit-log infra

Server-side hardening for osctrl-api, plus shared infrastructure
(rate-limit package, audit-log helpers, trusted-proxies plumbing)
that osctrl-tls also consumes — its consumer-side changes ship in a
companion PR so the TLS-facing surface can be tested in isolation.

== Auth bedrock ==

cmd/api:
  - --auth=jwt is now the default. Refuse to start with --auth=none
    unless OSCTRL_INSECURE_NO_AUTH=1 is set. When opted in, a 60s
    warning ticker keeps the deployment from drifting into
    'auth-off forever'.
  - HttpOnly + Secure cookie session for SPA-style clients
    (osctrl_token). CLI clients with Authorization: Bearer continue
    to work unchanged.
  - Double-submit CSRF (osctrl_csrf cookie + X-CSRF-Token header) for
    mutating cookie-authenticated requests. CLI Bearer flows exempt.
  - JWT signing-algorithm pin (HMAC only) to defeat alg-confusion
    attacks (alg:none / RS256-with-HS256-verify).
  - JWT secret minimum 32 bytes (HS256 needs HMAC key ≥ hash output).
    Startup fails fast with the openssl one-liner if too short.
  - Strict 'forwarded headers' trust via --trusted-proxies. Empty
    default means utils.GetIP ignores X-Forwarded-For / X-Real-IP —
    an internet attacker can't spoof IPs to defeat rate-limits or
    poison audit logs.

== Env secret containment + cross-env defense ==

pkg/types: new TLSEnvironmentView — the low-privilege env projection.
  Omits Secret, EnrollSecretPath, RemoveSecretPath, Certificate, Flags,
  and every other field that materially contributes to enrolling a node.

cmd/api/handlers/environments.go:
  - EnvironmentHandler now branches on access level: AdminLevel (or
    super-admin) gets the full storage struct; UserLevel gets the
    low-priv view.
  - EnvEnrollHandler / EnvRemoveHandler raised from UserLevel to
    AdminLevel — both embed the env's enroll/remove secret.
  - Both handlers log only the target name, not returnData.
  - EnvActionsHandler 'create' branch validates caller-supplied UUID
    via EnvUUIDFilter (rejects malformed) and EnvExists (rejects
    collision). 'delete' branch gets the same validation for symmetry.

cmd/api/handlers/queries.go: QueryResultsHandler now precheck-validates
  the named query belongs to env.ID via h.Queries.Exists(name, env.ID)
  and returns 404 otherwise. logging.GetQueryResults filtered on 'name'
  only, so without this gate a user with QueryLevel on env A could
  pull results from env B by passing B's query name in A's URL.

pkg/environments/environments.go: tighten EnvUUIDFilter regex and add
  axis-pure Exists/UUIDExists helpers so handler checks can match the
  router's expectations exactly.

== Shared rate-limit + audit-log infrastructure ==

pkg/ratelimit (new): per-key token-bucket rate limiter with idle
  eviction. Used by osctrl-api for /login here, and by osctrl-tls for
  /enroll in the companion PR. Tunable burst, window, and key
  function (KeyByIP today; KeyByIPAndEnv available).

pkg/auditlog/audit.go: FailedLogin + FailedEnroll helpers — a clean
  stream of authn/enrol failures for SoC tooling to alert on
  brute-force, password-spray, and enroll abuse.

pkg/utils/http-utils.go: SetTrustedProxies + an updated GetIP that
  honors the trusted-proxies set. Empty (default) ignores
  X-Forwarded-For / X-Real-IP entirely.

== SQL hardening + carve path safety ==

pkg/carves/utils.go: new ValidCarvePath regexp gate. Without this gate
  a CarveLevel operator could pass \`'; SELECT 1; --\` and pivot 'carve
  a file' into 'run any SELECT against your fleet' via GenCarveQuery's
  string concat.

cmd/api/handlers/carves.go (CarvesRunHandler): path validated before
  the SQL splice. Rejected paths return 400.

== Authz + audit-log hardening ==

pkg/users:
  - bcrypt cost raised from default (10) to 12. CheckLoginCredentials
    opportunistically re-hashes existing users at next login (no
    password reset needed). Rehash failure is non-fatal.
  - New ClearToken empties APIToken AND CSRFToken so any existing JWT
    + CSRF cookie pair stops validating. Used by future
    DELETE /api/v1/users/{username}/token in a follow-up PR.

cmd/api/handlers/{users,settings,environments}.go: authz tightenings
  around permission writes, settings PATCH, and env-action service-name
  validation.

pkg/environments/env-cache.go: keep the 2h cleanup interval; introduce
  an envCacheTTL constant so the value is self-documenting and tunable
  locally without changing runtime defaults.

== Defaults + ops ==

deploy/config/{api,admin}.yml: flip --audit-log default to true so
  audit log writes are on by default. Operators can disable with
  --audit-log=false.

Verified: go build ./... clean, go vet ./... clean, go test ./pkg/...
./cmd/api/... ./cmd/tls/... all green.
…it, audit-log on failed enrolls

Carves the osctrl-tls-facing changes out of the original "round 1"
security PR (jmpsec#813) into their own PR so the TLS surface
can be isolated for review and testing. Depends on the shared
infrastructure (pkg/auditlog FailedEnroll, pkg/ratelimit, and
utils.SetTrustedProxies / GetIP changes) that lands in jmpsec#813.

== Threats addressed ==

osctrl-tls is the only osctrl service that is internet-facing in a
typical deployment — every osquery node POSTs to /enroll, /config,
/log, etc. before any authentication exists. That makes its surface
the most exposed and the most worth hardening.

This PR adds three independent defenses:

== 1. Constant-time secret comparison ==

cmd/tls/handlers/utils.go: checkValidSecret and checkValidRemovePath
  switched from `strings.TrimSpace(secret) == env.Secret` (byte-by-byte
  short-circuiting compare) to `subtle.ConstantTimeCompare`. The old
  form leaked secret bytes via response-time timing — an attacker on
  the open internet could iterate one byte at a time. Constant-time
  compare eliminates that oracle.

== 2. Per-IP rate-limit on /enroll ==

cmd/tls/main.go: 20 requests/minute per remote IP, idle eviction after
  10 minutes. Rejected requests:
    - return 429
    - record an `AuditLog.FailedEnroll(ip, env, "rate limit exceeded", 0)`
      so SoC tooling can alert on enroll abuse / spray.
  Backed by pkg/ratelimit (from jmpsec#813). The default 20/min easily covers
  any realistic enroll burst (a wave of provisioning at site bring-up)
  while shutting down brute-force / DoS attempts.

== 3. Audit-log on failed enrollment ==

cmd/tls/handlers/handlers.go: HandlersTLS grows an AuditLog field +
  WithAuditLog option. Defensive default is a disabled manager so
  handler code can call h.AuditLog.FailedEnroll(...) without nil checks.

cmd/tls/handlers/post.go: EnrollHandler records FailedEnroll on the
  invalid-secret path. Combined with the rate-limit failures above,
  the audit log now carries the full picture of unwanted enroll
  traffic.

cmd/tls/main.go: initializes auditlog.CreateAuditLogManager and
  threads it into handlers via WithAuditLog. Service name "osctrl-tls"
  flows through to audit rows.

== 4. Trusted-proxies plumbing ==

cmd/tls/main.go: reads flagParams.Service.TrustedProxies (CIDR list)
  and calls utils.SetTrustedProxies before any request can reach
  GetIP. Empty default means GetIP ignores X-Forwarded-For /
  X-Real-IP and uses RemoteAddr — preventing internet-side
  header-spoofed bypass of the rate-limiter or audit-log poisoning.

== Config ==

deploy/config/tls.yml:
  - auditLog: true (on by default; was false)
  - trustedProxies: ""  (empty by default; operators behind a trusted
    reverse proxy can list the proxy CIDRs)

Verified: go build ./... clean, go vet ./... clean, go test ./cmd/tls/...
green. Live smoke on the dev stack: 4 osquery nodes enrolled, the
audit_logs table records `enroll_failure` rows with correct IPs for
forced bad-secret attempts.
@javuto javuto self-assigned this May 14, 2026
@javuto javuto added ✨ enhancement New feature or request osctrl-tls osctrl-tls related changes 🔐 security Security related issues labels May 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

✨ enhancement New feature or request osctrl-tls osctrl-tls related changes 🔐 security Security related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants